fix(chartdata): disable sqlparse calls for chart data requests to improve querying performance #19572
base: master
c6ea536 to fbc7e5d
Thanks for the contribution @dvchristianbors . While I agree this performance hit is very problematic, we currently rely heavily on I looked into the original issue that you linked here, and noticed there seems to be open PRs on |
Thanks for your comments. I agree that this would be the best option. I forked sqlparse and merged the PRs that supposedly addressed the issues; however, I cannot confirm that they improved performance (at least for the tested use cases). I will investigate the issue in sqlparse itself further and propose new changes as soon as I find a better solution. |
They seem very keen on not being used for validation. Also, the project has had 3 days of activity since October 2020, so I wouldn't expect much from them or rely too heavily on a project that has 200+ open issues. |
We also ran into this performance issue lately. Having some rather complex queries on a dashboard drastically increases the dashboard load time, as the inefficiency of sqlparse keeps the gunicorn workers busy. This, in turn, causes queries of the dashboard to be queued until a worker is free again. @villebro, given that there is virtually no activity in the sqlparse project, and considering @dvmarkusvogl's comment above, I feel that following the approach suggested in this PR is the best option we have. PS: We'll try patching sqlparse by virtue of andialbrecht/sqlparse#710 and report back... |
Update on our investigations:
Thank you @dvchristianbors for providing this patch. |
Thanks for letting us know. Did you test these changes with a recent branch? Our changes were made on a rather old version. |
Hey @dvchristianbors, I applied this patch manually, as there were some incompatible changes in the codebase, and it helped. The time from clicking chart refresh to the query being sent to the db dropped significantly. Show query is also much faster now but, as expected, the query it shows is uglier. |
I was tempted to close the related issue as stale, but would love to know if anyone on this thread has interest in rebasing/rekindling this effort? |
Yes, I would still be interested in finishing this feature, if a rebase next week is still fine. |
Perfectly fine. Thanks again! |
- slightly update test test_with_invalid_where_parameter_closing_unclosed__400
fbc7e5d to 41581e9
Hello @rusackas, the PR was updated and is now ready for review. |
Looks like the pre-commit hooks need to be run to solve at least some (if not all) of the CI issues. |
Approving CI run 🤞 |
@dvchristianbors, @betodealmeida recently introduced [SIP-117] Improve SQL parsing which proposes that we use |
I replaced the usage of sqlparse with sqlglot for splitting up multiple SQL statements. However, sqlglot does not include the formatting function that was specifically removed due to performance issues here. |
Note that it is possible to pretty-format a SQL query with sqlglot, but you need to know the dialect: @dvchristianbors do you mind leaving a TODO comment to tentatively re-add the SQL formatting once SIP-117 is merged? (I saw the PR is approved but has conflicts; I'll work on them.)
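As a rough illustration of dialect-aware pretty-formatting with sqlglot, here is a minimal sketch. The helper name `pretty_format` and the fallback behaviour (returning the SQL unchanged when sqlglot is missing or cannot parse the query) are assumptions for illustration, not code from this PR or from SIP-117.

```python
def pretty_format(sql: str, dialect: str) -> str:
    """Pretty-print `sql` for the given dialect, or return it unchanged on failure.

    Hypothetical helper: the name and the fallback policy are assumptions.
    """
    try:
        import sqlglot
        from sqlglot.errors import SqlglotError
    except ImportError:
        # sqlglot is not installed: keep the SQL as it is.
        return sql

    try:
        # Read and write the same dialect so the query is only re-formatted,
        # not translated. transpile() returns one string per statement.
        statements = sqlglot.transpile(sql, read=dialect, write=dialect, pretty=True)
    except (SqlglotError, ValueError):
        # Parse errors or an unknown dialect name: fall back to the raw SQL.
        return sql

    return ";\n".join(statements)


if __name__ == "__main__":
    print(pretty_format("select a, b from t where a in (1, 2, 3)", "trino"))
```

Falling back to the unformatted string keeps the request path working even when formatting fails, which matches the intent of this PR to treat formatting as optional.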
@dvchristianbors / @betodealmeida just checking in here... some folks on Slack were wondering if this is likely to merge. Looks like it needs a rebase at least, but also not sure if more needs to be done here regarding SIP-117. |
Our dashboards and charts use complex queries, resulting in high loading times. After investigating, we found that most of the time was consumed by sqlparse.parse and sqlparse.format. While searching for solutions, I came across this PR. Since this problem is not solved upstream as of now, I have put together a hacky solution that worked for us, so that it can benefit others who may be in the same boat as me. We're using the 4.0.2 Superset container image, but the patch in this PR wasn't directly compatible with our version. To address this, I used Following the PR's suggestion, I had sqlparse.format return the same SQL string that was passed to it, without formatting anything. For sqlparse.parse, since it was called multiple times for the same query from various places, I implemented basic LRU caching to avoid redundant calls. I've copied the patched app.py code block below. Sharing this in case anyone else encounters a similar issue and needs a quick, stable solution. The base app.py file for the patch below is taken from the 4.0.2 tag. The code block below is to be replaced with the original. Patched app.py
|
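A minimal sketch of the approach described in the comment above; it is not the commenter's actual app.py. The function name `patch_sql_parser`, the cache size, and the stand-in module used for the demo are assumptions. The only thing taken from sqlparse is the shape of its two entry points, `format(sql, encoding=None, **options)` and `parse(sql, encoding=None)`.

```python
import functools
from types import SimpleNamespace


def patch_sql_parser(module, cache_size: int = 128) -> None:
    """Patch a sqlparse-like module in place (hypothetical helper).

    After patching, `module.format` returns its input unchanged and
    `module.parse` is wrapped in an LRU cache keyed on its arguments.
    """
    original_parse = module.parse

    def passthrough_format(sql, encoding=None, **options):
        # Skip formatting entirely and hand back the SQL that was passed in.
        return sql

    @functools.lru_cache(maxsize=cache_size)
    def cached_parse(sql, encoding=None):
        # Repeated calls with the same arguments reuse the first result.
        return original_parse(sql, encoding)

    module.format = passthrough_format
    module.parse = cached_parse


# Stand-in module so the sketch runs without sqlparse installed.
calls = []
fake = SimpleNamespace(
    format=lambda sql, encoding=None, **options: sql.upper(),
    parse=lambda sql, encoding=None: calls.append(sql) or (sql,),
)
patch_sql_parser(fake)

# With the real library this would be, early at application start-up:
#   import sqlparse
#   patch_sql_parser(sqlparse)
```

Two caveats with this kind of patch: cached parse results are shared between callers, so any code that mutates the returned statements would affect later callers, and modules that imported the functions by name before the patch ran keep the unpatched versions.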
017b2f0 to 8b1a28e
8b1a28e to 482dc3a
@rams3sh your app.py does the work for us on a really long and complex trino query 👍 |
fix(chartdata): improve querying performance for long where ... in (...) statements.
SUMMARY
As described in andialbrecht/sqlparse#621, calling the sqlparse.format and sqlparse.parse functions with grouping queries causes quadratic runtime, leading to very long computation times for queries with large statements, e.g., long IN statements with a large number of keys.
In Chart Data, the sqlparse functions are called needlessly, for several reasons:
- connectors/sqla/models/get_query_str_extended, so if erroneous SQL code is produced, we will still receive an error.
- sqlparse.format(sql, reindent=True) to reindent will only have a minor improvement in readability while sacrificing computation time.
Looking at issue #19567, querying times vary significantly depending on the IN clause key size. Omitting the parse() and format() functions from the code used by ChartData will solve this issue without introducing any impediments.
BEFORE/AFTER ANIMATED GIF
Before
After
TESTING INSTRUCTIONS
Please see issue #19567 for detailed steps to reproduce.
I also added a test case that showcases the computational penalty in this commit
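For a quick check outside the test suite, the penalty can be observed with a small timing harness like the sketch below. The helper names, the table name `tbl`, and the column name `col` are hypothetical; the harness times any callable, so it runs without sqlparse installed.

```python
import time
from typing import Callable, List, Tuple


def build_in_query(num_keys: int) -> str:
    """Build a query with a long IN clause (table and column names are made up)."""
    keys = ", ".join(f"'key_{i}'" for i in range(num_keys))
    return f"SELECT col FROM tbl WHERE col IN ({keys})"


def time_formatter(
    formatter: Callable[[str], object], sizes: List[int]
) -> List[Tuple[int, float]]:
    """Return (number of keys, elapsed seconds) for one call per size."""
    results = []
    for n in sizes:
        sql = build_in_query(n)
        start = time.perf_counter()
        formatter(sql)
        results.append((n, time.perf_counter() - start))
    return results


# Example with sqlparse, if it is installed:
#   import sqlparse
#   for n, seconds in time_formatter(
#       lambda s: sqlparse.format(s, reindent=True), [100, 1000, 5000]
#   ):
#       print(n, round(seconds, 3))
```

If the runtime is quadratic as described in the summary, a tenfold increase in keys should raise the elapsed time by roughly a factor of one hundred.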
ADDITIONAL INFORMATION